
Pharmacoepidemiology and Drug Safety

Wiley

Preprints posted in the last 90 days, ranked by how well they match Pharmacoepidemiology and Drug Safety's content profile, based on 13 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Data Resource profile: Medicines in Acute and Chronic Care in Scotland (MACCS)

Goswami, C.; Mueller, T.; Kurdi, A.; Pearson, E. R.; Bedair, K.; Tolfrey, A.; Close, H.; Bennie, M.

2026-03-22 pharmacology and therapeutics 10.64898/2026.03.19.26348795 medRxiv
Top 0.1%
14.5%

Background: Routinely collected prescribing and medicine-related data in Scotland are comprehensive and of high quality. However, they are generated across multiple healthcare settings and stored in complex source systems that are not optimised for longitudinal or outcomes-focused research. To maximise the research value of these data, there is a need for curated, analysis-ready resources that provide consistent representations of medicines exposure and enable linkage to clinical outcomes. The Medicines in Acute and Chronic Care in Scotland (MACCS) resource provides standardised, curated medicines data to support longitudinal analyses of medicine-related exposure across NHS healthcare systems. Methods: The MACCS resource is a national individual-level medicines dataset for adults (18 years of age and older), derived from routinely collected prescribing and medicine-related data held by Public Health Scotland (PHS). It integrates data from the Hospital Electronic Prescribing and Medicines Administration (HEPMA), Prescribing Information System (PIS), and Homecare Medicines (HCM) datasets, which are linked at the individual level to eleven other national clinical records, including Scottish Morbidity Records (SMR00/01/02/04/06), laboratory data and mortality records, using the unique NHS Scotland person identifier. Data are curated, harmonised and pre-linked within the National Safe Haven and accessed by approved researchers through secure Trusted Research Environments. Results: MACCS contains individual-level information on adults receiving NHS Scotland care, including patient demographics (such as age, sex and geographical indicators) and detailed records of medicines prescribing in community pharmacies as well as those administered in hospitals and through homecare services. Medicines-related data capture exposure dates and, where available, details on formulation, strength and dose. In addition, MACCS includes cancer registry data, renal registry data, laboratory test results, microbiology surveillance and mortality records. The earliest dates of data availability vary by source dataset. Conclusion: MACCS provides a sustainable, longitudinal medicines research resource that simplifies access to complex national prescribing data and enables robust linkage to health outcomes. By supporting population-scale analyses across care settings, MACCS enhances the capacity for high-quality research to inform clinical practice, health policy, and medicines optimisation in Scotland.

Key Features:
- The Medicines in Acute and Chronic Care in Scotland (MACCS) data resource was established in 2025 to integrate medicine-related data with other electronic data from Scottish healthcare systems, creating a national, linked, routinely updated data resource at population level.
- MACCS provides pre-linked data from multiple routinely collected national datasets within NHS Scotland including, but not limited to, prescribing records, hospital episodes, laboratory results, and death records, within a single secure environment.
- MACCS includes patient demographics, data on medicines prescribing and administration/supply, key biochemistry and haematology test results (e.g., kidney and liver function tests), data on hospital admissions and surgical procedures, and date and cause of death.
- The data resource provides longitudinal follow-up of the adult population (≥18 years of age) receiving medicines through NHS Scotland since 2010, covering approximately 4.6 million individuals, and supports pharmacoepidemiological studies, drug utilisation research, pharmacovigilance projects, as well as health services research.
- Approved researchers can apply through a streamlined process to access the linked MACCS data resource through established NHS Scotland governance processes, with data accessed within a Trusted Research Environment.

2
The Adverse Event Atlas and Signal Consensus Index: A Multi-Source Pharmacovigilance Platform

Bentsen, A.

2026-05-06 pharmacology and therapeutics 10.64898/2026.05.05.26352239 medRxiv
Top 0.1%
13.9%

Background: Post-market pharmacovigilance relies predominantly on single-database disproportionality analysis of spontaneous adverse event reports, which lacks corroboration across independent evidence streams and cannot integrate randomised trial evidence. No publicly accessible platform has previously combined European national pharmacovigilance registries, the US FDA Adverse Event Reporting System (FAERS), and clinical trial meta-analyses into a unified, continuously scored signal detection framework. Methods: We describe the Signal Consensus Index (SCI), a composite 0-100 pharmacovigilance signal score integrating disproportionality evidence from the Danish National Pharmacovigilance Database, the UK MHRA Yellow Card scheme, and FAERS, with DerSimonian-Laird meta-analytic risk ratios from ClinicalTrials.gov, across 6,905,874 drug-adverse event pairs. Each source contributes a continuous score derived from the lower bounds of three complementary disproportionality metrics (ROR, PRR, IC025) for spontaneous reporting sources, and from the pooled risk ratio lower confidence bound for clinical trials. The SCI is publicly accessible via the Adverse Event Atlas (aeatlas.com). We report reference set validation against the EU-ADR reference standard, a single-source comparison with discordance characterisation, temporal stability analysis across eight cumulative data windows (2015-2023), and a weight sensitivity analysis across seven pre-specified weighting schemes. Results: The SCI generated 129,176 Moderate-or-Strong signals (SCI ≥ 50, confidence ≥ 50) and 7,290 Strong signals (SCI ≥ 70, confidence ≥ 70). Reference set validation against 88 classifiable drug-event pairs (44 positive controls, 44 negative controls) yielded 18 true positives, 0 false positives, 44 true negatives, and 26 false negatives (sensitivity 40.9%, specificity 100.0%, PPV 100.0%, NPV 62.9%). Zero false positives were observed across all 44 classifiable negative controls, with five false negatives attributable to the confidence gate correctly suppressing single-source signals pending multi-source corroboration. Single-source comparison demonstrated that FAERS alone generated 1,438,246 disproportionality signals, of which 94.8% were not confirmed by the SCI, while 54,184 SCI-detected signals were absent from FAERS, of which 8.3% involved drugs absent from the US reporting system. Discordance analysis showed that 99.8% of Danish non-confirmation reflected data availability constraints. Temporal stability was high: 98.5% of pairs received identical classifications across all seven weight scenarios, and 57.0% of final Strong signals were already detectable as Moderate or Strong in the earliest data window (2015-2016). Strong classifications were stable across weight scenarios (94.0% of Strong observations remaining Strong). Conclusions: The SCI provides a transparent, openly accessible framework for cross-source pharmacovigilance signal prioritisation with 100% specificity and PPV against an established reference standard and stable classifications across weighting schemes. Progressive signal emergence through the Moderate tier supports its use as an early detection layer. The platform is available at aeatlas.com.
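The reference-set metrics quoted in this abstract follow directly from the four confusion-matrix counts it reports. A minimal Python check, using only those counts (the function name is illustrative and is not part of the Adverse Event Atlas):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


# Counts as reported in the abstract: 18 TP, 0 FP, 44 TN, 26 FN.
metrics = diagnostic_metrics(tp=18, fp=0, tn=44, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With 44 positive controls, 18 detections give 18/44 sensitivity, and the 26 misses are what pull the NPV down to 44/70.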

3
Monoclonal antibody dispensing during and around pregnancy: a descriptive analysis using electronic health records in Italy

Aiton, E.; Nazzari, V.; Cornish, R. P.; Faber, B. G.; Burden, C.; Birchenall, K.; Borges, M. C.; Lawlor, D. A.

2026-03-27 epidemiology 10.64898/2026.03.25.26349279 medRxiv
Top 0.1%
10.2%

Objective To describe trends in dispensing of monoclonal antibodies (mAbs) for autoimmune conditions during and around pregnancy. Design Descriptive study. Setting Lombardy, Italy between 2012 and 2024. Population All women of reproductive age (14-49 years) resident in Lombardy. Methods We described trends in mAb dispensations among women of reproductive age and the prevalence of mAb dispensing before, during and after pregnancy. We explored maternal factors associated with discontinuation. Main outcome measures Change in prescribing of mAbs over time in all women of reproductive age, and before, during and after pregnancy in those who became pregnant. Prevalence of discontinuation and switching mAbs around pregnancy. Results We included 3,049,175 women of reproductive age and 859,699 pregnancies. Prevalence of mAb dispensing during pregnancy increased over 60-fold over the study period, from 0.0041% (95%CI:0.00084, 0.012) in 2012 to 0.27% (95%CI:0.23, 0.32) in 2024. Pregnancy affected mAb dispensing, with mean prevalence decreasing from 0.080% (95%CI:0.074, 0.087) before pregnancy to 0.051% (95%CI:0.046, 0.057) by the third trimester. Over half (53.3%) of pre-existing users discontinued before or during pregnancy; discontinuation decreased over time, and varied substantially between mAbs. Switching mAbs during pregnancy was rare (3.3%). We found limited evidence that sociodemographic factors were associated with discontinuation, but that some health factors may be, such as use of assisted reproductive technology (OR=1.92, 95%CI:0.98-3.77). Conclusions Italian population-wide data from 2012-2024 show an increase in mAbs dispensed during pregnancy, and fewer instances of discontinuing these drugs over time. This may reflect recent changes in prescribing guidelines for mAbs in pregnancy.

4
TrialScout links published results to trial registrations using a large language model

Ahnström, L.; Bruckner, T.; Aspromonti, D. A.; Caquelin, L.; Cummins, J.; DeVito, N. J.; Axfors, C.; Ioannidis, J. P. A.; Nilsonne, G.

2026-03-17 epidemiology 10.64898/2026.03.15.26348383 medRxiv
Top 0.1%
8.3%

Background: Multiple stakeholders need to locate results of registered clinical trials but frequently struggle to find them. Summary results of clinical trials are often not published in trial registries, and publications containing trial results are often not explicitly linked to their respective trial registrations. Finding these results is important to researchers, systematic reviewers, research funders, regulators, clinical practitioners, and patients. Methods: We developed TrialScout, a computer program that uses a large language model to match clinical trials registered on ClinicalTrials.gov with corresponding result publications indexed in PubMed. TrialScout's performance was evaluated through comparison to human-coded matches from previous studies of results reporting rates. Subsequently, TrialScout was applied to a random sample of 9,600 completed or terminated trials. Results: TrialScout had a sensitivity of 92.5% and a specificity of 81.2% compared to human coders. Manual review of 200 cases where TrialScout disagreed with human researchers showed that a majority (123/200, 61.5%, 95% CI, 54.4-68.3%) of disagreements were due to human errors. When used on 9,600 sampled trials in ClinicalTrials.gov, TrialScout found result publications for 6,110 (63.6%) trials. Discussion: TrialScout reliably located results of completed clinical trials. The tool offers benefits in terms of speed and efficiency. Estimation of TrialScout's accuracy is limited by the lack of a true gold standard. TrialScout can accelerate the process of locating trial results in the scientific literature and can assist in monitoring trial reporting practices.

5
Frequency and Medical Costs of Hypersensitivity- and Anaphylaxis-Related Adverse Events for Different Intravenous Iron Products Using the US Food and Drug Administration Adverse Event Reporting System (FAERS)

Wang, Y.; Numan, S.

2026-05-01 epidemiology 10.64898/2026.04.30.26352160 medRxiv
Top 0.1%
7.3%

Background: In the United States (US), several intravenous (IV) iron products are available for treatment of iron deficiency, including low-molecular-weight iron dextran (LMWID), iron sucrose (IS), ferumoxytol (FM), ferric carboxymaltose (FCM), ferric derisomaltose (FD), and sodium ferric gluconate (FG). However, these IV iron products are associated with rare, but serious, hypersensitivity and anaphylactic reactions. Objective: This study aimed to assess the frequencies of hypersensitivity and anaphylactic reactions and associated downstream medical costs of the six IV iron products in the US. Methods: This study used data from the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) from January 1, 2014, to June 30, 2024. The lower bound of the 90% confidence interval of the reporting odds ratio (ROR05) was used to identify a likely drug-adverse event (AE) association related to hypersensitivity and anaphylactic reactions. Downstream medical costs were estimated using Agency for Healthcare Research and Quality/Healthcare Cost and Utilization Project data. Results: Signal strength of a likely drug-AE association for hypersensitivity was highest with FG (ROR05=9.66) and lowest with FCM (2.87). Signal strength for anaphylactic reactions was highest with FM (43.59) and lowest with FCM (6.99). The medical cost per AE was lowest with FCM (US$2,348) and highest with LMWID (US$9,593). Conclusion: FCM had the lowest signal strength of a likely drug-AE association for hypersensitivity and anaphylaxis and the lowest medical cost per AE in the US patient population, demonstrating its potential value by improving patient safety while lowering overall medical spending. Plain Language Summary: This study has found that ferric carboxymaltose (FCM) had the lowest signal strength of a likely drug-adverse event (AE) association for hypersensitivity and anaphylaxis, compared to other intravenous iron products. FCM also had the lowest downstream medical cost per AE in the US patient population. These findings suggest that FCM may provide value by improving patient safety while reducing overall medical spending in the real-world setting.
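For readers unfamiliar with ROR05, the sketch below shows one common way to obtain the lower bound of a two-sided 90% confidence interval for a reporting odds ratio from a 2x2 table of report counts. It assumes the standard Wald interval on the log scale; the abstract does not state which interval the authors used, and the counts here are hypothetical, not taken from FAERS or the study.

```python
import math
from statistics import NormalDist


def ror_with_lower_bound(a, b, c, d, level=0.90):
    """Reporting odds ratio and the lower bound of its two-sided CI.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    ror = (a * d) / (b * c)
    # Wald standard error of log(ROR) (an assumption, see lead-in)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    lower = math.exp(math.log(ror) - z * se_log)
    return ror, lower


# Hypothetical counts for illustration only.
ror, ror05 = ror_with_lower_bound(a=20, b=80, c=10, d=90)
print(f"ROR = {ror:.2f}, ROR05 = {ror05:.2f}")
```

A lower bound above 1 is the usual threshold for flagging a disproportionality signal, which is why the abstract ranks products by ROR05 rather than by the point estimate.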

6
A longitudinal cohort study comparing clinical trials registered on ClinicalTrials.gov that stopped during the first three years of the SARS-CoV-2 pandemic with trials that stopped in the three years prior

Carlisle, B. G.; Hutchinson, N.; Moyer, H.

2026-05-22 public and global health 10.64898/2026.05.20.26353581 medRxiv
Top 0.1%
6.9%

Background: The global SARS-CoV-2 pandemic disrupted healthcare systems worldwide, raising concerns about its impact on clinical research. Early reports suggested reductions in participant enrollment, interruptions to ongoing trials, and challenges to protocol adherence, yet the magnitude and duration of these operational disruptions remain unclear. Methods: We conducted a registry-based analysis comparing clinical trials during the COVID-19 pandemic (December 2019 to November 2022) with a matched pre-pandemic cohort (December 2016 to November 2019). Studies were included if they reported any modifications to trial status, enrollment, or protocols during the study periods. Key variables included trial stoppage, enrollment changes, and adoption of remote or hybrid procedures. Results: The global SARS-CoV-2 pandemic resulted in widespread disruptions to trial operations with 13,323 clinical trials terminated, suspended or withdrawn over the course of the pandemic, a 38% increase compared to the 9,665 trials that stopped in the 3 years prior to the pandemic. Registries indicated a sharp decline in new participant enrollment across geographic regions and therapeutic areas, with partial recovery in later months. Review findings highlighted barriers including patient inaccessibility, staff redeployment, and supply chain interruptions. Conclusions: The pandemic caused system-wide operational shocks that compromised trial timelines and may have downstream methodological consequences. Recovery in enrollment does not imply restoration of pre-pandemic protocol fidelity or outcome ascertainment. Standardized reporting of disruptions, proactive contingency planning, and resilient trial designs are needed to maintain data integrity during large-scale disruptions and to support reliable evidence generation.

7
BRIDGE: a barrier-informed Bayesian Risk prediction model for risk IDentification, trajectory Grouping, and profiling of non-adherencE to cardioprotective medicines in primary care

Koh, H. J. W.; Trin, C.; Ademi, Z.; Zomer, E.; Berkovic, D.; Cataldo Miranda, P.; Gibson, B.; Bell, J. S.; Ilomaki, J.; Liew, D.; Reid, C.; Lybrand, S.; Gasevic, D.; Earnest, A.; Gasevic, D.; Talic, S.

2026-04-22 pharmacology and therapeutics 10.64898/2026.04.21.26351387 medRxiv
Top 0.1%
6.4%

Background: Non-adherence to lipid-lowering therapy (LLT) affects up to half of patients and contributes substantially to preventable cardiovascular morbidity and mortality. Existing measures, such as the proportion of days covered, provide cross-sectional summaries but fail to capture the dynamic patterns of adherence over time. Although group-based trajectory modelling identifies distinct longitudinal adherence patterns, no approach currently predicts trajectory membership prospectively while incorporating patient-reported barriers. We developed BRIDGE, a barrier-informed Bayesian model to predict adherence trajectories and identify their underlying drivers. Methods: BRIDGE incorporates patient-reported barriers as structured prior information within a Bayesian framework for adherence-trajectory prediction. The model was designed not only to estimate which patients are likely to follow different adherence trajectories, but also to generate clinically interpretable probability estimates that help explain why those trajectories may arise and what modifiable factors may be most relevant for intervention. Results: BRIDGE achieved a macro AUROC of 0.809 (95% CI 0.806 to 0.813), comparable to random forest (0.815; 95% CI 0.812 to 0.819) and XGBoost (0.821; 95% CI 0.818 to 0.824), two widely used machine-learning benchmarks for structured clinical prediction. Calibration was superior to random forest (Brier score 0.530 vs 0.545), and performance was stable across six independent training runs (AUROC SD = 0.003). Incorporating barrier-informed priors improved accuracy by 3.5% and calibration by 5.5% compared to flat priors, showing that incorporation of patient-reported barriers added value beyond electronic medical record data alone. Four clinically distinct adherence trajectories were identified: gradual decline associated with treatment deprioritisation amid polypharmacy (10.4%), early discontinuation linked to asymptomatic risk dismissal (40.5%), rapid decline associated with intolerance (28.8%), and persistent adherence (20.2%). Counterfactual analysis identified trajectory-specific intervention levers. Conclusions: BRIDGE provides accurate and well-calibrated prediction of adherence trajectories while offering clinically actionable insights into their underlying drivers. By integrating patient-reported barriers with routine clinical data, the model supports targeted, mechanism-informed interventions at the point of prescribing to improve adherence to cardioprotective therapies. Funding: MRFF CVD Mission Grant 2017451. Evidence before this study: We searched PubMed and Scopus from database inception to December 2025 using the terms "medication adherence", "trajectory", "prediction model", "Bayesian", "lipid-lowering therapy", and "barriers", with no language restrictions. Group-based trajectory modelling has consistently identified three to five adherence patterns across cardiovascular cohorts; however, these applications have been descriptive rather than predictive. Machine-learning models for adherence prediction achieve moderate discrimination but treat adherence as a binary or continuous outcome, thereby overlooking the clinically meaningful heterogeneity captured by trajectory approaches. One prior study applied a Bayesian dynamic linear model to examine adherence-outcome associations, but it did not predict adherence trajectories or incorporate patient-reported barriers. To our knowledge, no published model integrates patient-reported barriers into trajectory prediction. Added value of this study: BRIDGE is, to our knowledge, the first model to incorporate patient-reported adherence barriers as hierarchical domain-informed priors within a Bayesian framework for trajectory prediction. Using 108 predictors derived from routine electronic medical records, the model achieves discrimination comparable to state-of-the-art machine-learning approaches while additionally providing uncertainty quantification, barrier-level interpretability, and counterfactual insights to inform intervention strategies. The identified trajectories differed not only in adherence level but also in switching behaviour, drug-class evolution, and medication burden, suggesting distinct underlying mechanisms of non-adherence that may require tailored clinical responses. Implications of all the available evidence: Each adherence trajectory implies a distinct intervention target: asymptomatic risk communication for early discontinuers (40.5% of patients), proactive tolerability management for rapid decliners, medication simplification for patients with gradual decline associated with polypharmacy, and maintenance support for persistent adherers. By integrating routinely collected clinical data with patient-reported barriers, BRIDGE can be deployed within existing primary care EMR infrastructure to generate actionable, trajectory- and patient-specific recommendations at the point of prescribing, helping to bridge the gap between adherence measurement and targeted adherence management.

8
Development of Longitudinal, Linked Maternal-Infant Cohorts using the Epic Cosmos Electronic Health Record Dataset

Leonard, S. A.; Dysart, K.; Callahan, A.; Siadat, S.; Zhang, J.; Handley, S. C.; Huybrechts, K. F.; Igbinosa, I.; Bateman, B. T.

2026-06-04 epidemiology 10.64898/2026.06.02.26354757 medRxiv
Top 0.1%
4.4%

Background: Epic Cosmos is a relatively new centralized electronic health record dataset with high potential utility in perinatal epidemiologic research. Objectives: The study objectives were to develop replicable steps to create longitudinal, linked maternal-infant cohorts in Cosmos, assess completeness of key variables, evaluate potential selection bias with restrictions for longitudinal healthcare encounters, and provide an example epidemiologic analysis. Methods: We created maternal-infant cohorts by starting with live births during 2023-2024 recorded in the BirthFact data table and joining with additional data tables as needed. We selected and created variables for perinatal characteristics, common comorbidities, and routinely measured vital signs and laboratory values, and assessed variable completeness. We sequentially restricted the birth cohort for maternal-infant linkage and longitudinal healthcare from first-trimester prenatal care encounter through infant follow-up care within 12 weeks post-discharge from birth hospitalization. Finally, we conducted an example analysis of the association between high systolic blood pressure in the first trimester (≥140 mm Hg) and later onset of preeclampsia among those with chronic hypertension. Results: The total linked birth cohort included 2,624,186 pregnancies. Completeness was >90% for most variables assessed but was 77% for racial and ethnic group and 76% for body mass index at delivery. Characteristics of the cohort were similar to those reported for the entire United States birth population based on birth certificate data, including similar regional and racial-ethnic composition. Longitudinal cohort restriction requiring linked records from first trimester prenatal care through infant follow-up care reduced the cohort size to 509,148 pregnancies. However, restriction had minimal effects on cohort characteristics. 
In the example analysis, high systolic blood pressure was associated with increased risk of preeclampsia among those with chronic hypertension (aRR: 1.26; 95% CI: 1.22, 1.30). Conclusions: This study provides a rigorous and reproducible approach to creating longitudinal, linked maternal-infant cohorts in Epic Cosmos and the analytical findings suggest high data quality and representativeness.

9
Added value of point-of-care testing for Group A Streptococcus in community pharmacy sore throat pathways: Analysis of the Wales Sore Throat Test and Treat service

Bustamante, Q.; Thornton, H.; Lawson, G.; Guy, R.; Ahmed, H.; Evans, A.; Cannings-John, R.; Mantzourani, E.; Jones, C.; Brown, C. S.; Hall, V.; Lamagni, T.; Mirfenderesky, M.

2026-03-19 public and global health 10.64898/2026.03.18.26347584 medRxiv
Top 0.1%
3.9%

Objective: To evaluate the diagnostic performance of FeverPAIN and Centor with point-of-care test (POCT) results for Group A Streptococcus (GAS) among children and adults presenting with sore throat in community pharmacies. Methods: Cross-sectional analysis of patients aged six years and over with sore throat presenting to community pharmacies across Wales delivering the Sore Throat Test and Treat (STTT) service from November 2018 to September 2024. Patients who scored FeverPAIN ≥2 or Centor ≥3 and were able to undergo POCT were eligible for analysis. We described GAS positivity by age group and assessed diagnostic performance of FeverPAIN at the National Institute for Health and Care Excellence (NICE) antibiotic threshold (≥4), reporting sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under receiver operating characteristic curve (AUROC) with 95% confidence intervals (CI). We estimated potential overtreatment and undertreatment if antibiotics were supplied based on FeverPAIN alone. Results: Among 73,617 eligible patients, 37.0% (n=27,220) tested POCT-positive for GAS. Positivity was highest in children aged 6-10 years (47.0%: 5,339/11,371). FeverPAIN was used in 92.5% (n=68,099) of assessments. At the NICE-recommended threshold for antibiotic treatment (FeverPAIN ≥4), sensitivity was 55.0% (95% CI: 54.4-55.6%) and specificity 77.0% (95% CI: 76.6-77.4%). PPV was 57.6% (95% CI: 57.0-58.2%) and NPV 75.1% (95% CI: 74.7-75.5%). Overall AUROC was 0.70 (95% CI: 0.70-0.71), with the lowest AUROC of 0.69 (95% CI: 0.68-0.70) observed among children aged 6-10 years. Using FeverPAIN alone would undertreat 44% and overtreat 23% of patients based on POCT results. Conclusions: FeverPAIN alone showed limited diagnostic performance for identifying GAS, with more pronounced discordance observed among children. Incorporating POCTs within community pharmacy sore throat pathways may support more targeted antibiotic prescribing. Our findings support a re-evaluation of the role of POCTs within community pharmacy sore throat pathways.

10
Glycemic response trajectories on metformin monotherapy in real-world diabetes care

Raghavan, S.; Liu, W. G.; Ho, M. R.; Warsavage, T.; Ghosh, D.; Caplan, L.; Reusch, J. E.

2026-05-26 endocrinology 10.64898/2026.05.24.26353996 medRxiv
Top 0.1%
3.6%

Objectives: Diabetes affects over 500 million people globally and glycemia is inadequately managed. Metformin is the most frequently prescribed initial treatment for type 2 diabetes globally, yet glycemic response trajectories to metformin in routine real-world care and predictors of treatment response have not been well described. We aimed to identify glycemic response trajectories in adults prescribed metformin monotherapy as initial type 2 diabetes treatment and predictors of poor glycemic response to metformin. Design: Observational cohort study using latent class mixed models to identify hemoglobin A1c (HbA1c) trajectory classes, followed by random forests machine learning to predict trajectory class membership. Setting: US Veterans Affairs Healthcare System. Participants: Adults treated with metformin alone for >30 days after diabetes diagnosis with a minimum of two HbA1c measurements from 90 days prior to two years after the first metformin prescription (N=140,413). Exposures: Demographic, laboratory, vital sign, and comorbidity data were included as predictors of metformin response trajectory. Main Outcomes and Measures: We included all HbA1c measurements (487,604 total) for two years after metformin initiation to define metformin glycemic response trajectories. Results: We identified three HbA1c trajectories: stably low (89.7% of sample, mean HbA1c decrease from 7.2% to 6.6%), brisk response (7.1% of sample, mean HbA1c decrease from 11.4% to 7.0%), and non-response (3.1% of sample, mean HbA1c increase from 8.9% to 10.8%). Of those in the stably low and brisk response classes at 2 years, 91% maintained HbA1c at approximately 7% on metformin alone for 5 years after drug initiation. Prediction models could accurately predict brisk response (91% accuracy) but not metformin non-response (59% accuracy). Conclusions: Most individuals treated initially with metformin monotherapy have a beneficial and durable glycemic response. Non-response to metformin may be difficult to predict in advance, but it becomes evident within six months under recommended glycemic surveillance. The findings support current guidelines for HbA1c surveillance when initiating diabetes treatment.

11
Using human genetics to understand the effect of modulating targets of antihypertensive drugs in pregnancy

Borges, M. C.; Urquijo, H.; Yang, Q.; van der Graaf, A.; McBride, N.; Haug, E. B.; Soares, A. G.; Clayton, G. C.; Bond, T. A.; Al Arab, M.; Horn, J.; Thomas, L.; Bhatta, L.; Asvold, B. O.; Magnus, M. C.; Evans, D. M.; Burden, C.; Birchenall, K.; Brumpton, B.; Gaunt, T. R.; Hart, E. C.; Kutalik, Z.; Lawlor, D. A.

2026-05-20 epidemiology 10.64898/2026.05.12.26352361 medRxiv
Top 0.1%
3.6%

Background and Aims Hypertension during pregnancy is a major cause of maternal and neonatal morbidity and mortality, yet the efficacy and safety of antihypertensive treatments in this setting remain uncertain. We evaluated the effects of antihypertensive drug targets on adverse pregnancy-related outcomes using genetic variants to instrument target perturbation. Methods We performed drug target Mendelian randomization to mimic pharmacological perturbation of targets from six commonly used antihypertensive drug classes, using data from up to 671,922 pregnant women. Genetic variants near drug target genes associated with systolic or diastolic blood pressure were selected as instruments. We estimated effects of target modulation on six primary and eight secondary pregnancy outcomes. Results Genetically instrumented downregulation of blood pressure through beta-blocker (BB) and calcium-channel blocker (CCB) targets, particularly ADRB1 and CACNB2, was associated with a reduced risk of hypertensive disorders of pregnancy, including preeclampsia. For example, CACNB2-instrumented lowering corresponded to a 7% (95% CI: 5-9%) reduction in preeclampsia risk per 1 mmHg decrease in blood pressure. For most other targets, estimates were directionally consistent but imprecise. Across additional outcomes, effects varied by target, with suggestive evidence for reduced risks of miscarriage, preterm birth, small-for-gestational-age birth, and labour induction, although these estimates were accompanied by substantial uncertainty. Conclusions These findings support a protective effect of BB and CCB targets on hypertensive disorders of pregnancy and highlight potential target-specific differences in safety. This work illustrates the value of Mendelian randomization in addressing clinical uncertainties where robust trial evidence is limited.

12
Accounting for Uncertainty in the Null Benchmark in Two-Stage Phase II Trials

Irlmeier, R.; Jin, Z.; Ye, F.

2026-05-18 epidemiology 10.64898/2026.05.14.26353210 medRxiv
Top 0.1%
3.6%

Background Simon two-stage designs for binary endpoints and their time-to-event analogues, including the Kwak and Jung method, rely on a fixed null benchmark. Their Type I error control is valid only when that benchmark is correctly specified. In practice, historical benchmarks are often inconsistent due to small samples, population heterogeneity, changing eligibility criteria, and evolving standards of care. Even modest misspecifications can substantially inflate the Type I error rate, leading to costly advancement of ineffective treatments. Methods We propose the Interval-Null Robust (INR) two-stage design framework that accounts for uncertainty in the historical null benchmark. We define the null hypothesis as a plausible range of clinically uninteresting values: $p \in [p_{0L}, p_{0U}]$ for binary endpoints and $\lambda \in [\lambda_{0L}, \lambda_{0U}]$ (or equivalent survival probabilities) for time-to-event endpoints. Type I error is controlled uniformly over the full null interval: $\sup_{\theta \in \Theta_0} \Pr_{\theta}(\text{Go}) \le \alpha$. Under the monotonicity of the Go probability, the supremum occurs at the least favorable null configuration ($p_{0U}$ and $\lambda_{0L}$), but the design is not reduced to a point-null formulation. The interval defines the uncertainty set for error control and is used in selecting among feasible designs through robust criteria such as worst-case regret or minimal average expected sample size. Results Across representative planning scenarios for both endpoint types, classic designs calibrated to a single benchmark exhibit substantial Type I error inflation when the true null parameter exceeds the assumed planning value. INR designs maintain the nominal Type I error rate across the full null interval, directly addressing this vulnerability to benchmark misspecification. The robustness-efficiency trade-off can be managed through design constraints and robust optimization criteria while preserving uniform Type I error control. 
Conclusions INR two-stage designs offer a transparent framework for addressing historical control uncertainty in single-arm Phase II trials. By replacing reliance on a fixed benchmark assumption with a more realistic interval of clinically plausible null values, INR design reduces the risk of false-positive Go-decisions caused by benchmark misspecification. INR applies to both binary and time-to-event endpoints and is implemented in the open-source INRDesign R package and accompanying interactive Shiny app.
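
The error-inflation argument in this abstract can be illustrated with a minimal sketch for the binary-endpoint case. It does not use the INRDesign package and is not the authors' implementation; the function names, the evaluation grid, and the design parameters (stop after 10 patients if at most 1 responds, declare Go if more than 5 of 29 respond) are assumptions chosen purely for illustration. The sketch computes the Go probability of a fixed two-stage design and evaluates it over a null interval, where the maximum is expected at the upper endpoint.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 1.0
    return sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))


def go_probability(p: float, r1: int, n1: int, r: int, n: int) -> float:
    """Probability of a Go decision for a two-stage design.

    Stage 1 enrolls n1 patients and stops for futility if responses <= r1.
    Otherwise n - n1 more patients are enrolled, and the decision is Go
    if the total number of responses exceeds r.
    """
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2 with x1 responses; need more than r - x1 there.
        total += binom_pmf(x1, n1, p) * binom_sf(r - x1, n2, p)
    return total


def worst_case_type_one_error(p_low, p_high, design, grid=51):
    """Evaluate the Go probability on a grid over [p_low, p_high].

    Returns (largest Go probability found, response rate where it occurs).
    """
    step = (p_high - p_low) / (grid - 1)
    points = [p_low + i * step for i in range(grid)]
    return max((go_probability(p, **design), p) for p in points)


# Illustrative design parameters (an assumption, not from the preprint).
DESIGN = dict(r1=1, n1=10, r=5, n=29)

if __name__ == "__main__":
    for p in (0.10, 0.125, 0.15):
        print(f"p = {p:.3f}  Pr(Go) = {go_probability(p, **DESIGN):.4f}")
    sup, at = worst_case_type_one_error(0.10, 0.15, DESIGN)
    print(f"worst case over [0.10, 0.15]: {sup:.4f} at p = {at:.3f}")
```

Under these assumed parameters the design holds its error rate below 5% only if the true null response rate is 0.10; an interval-null design would instead have to choose its stopping boundaries so that the value at the upper endpoint stays below the nominal level.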

13
Racial Disparities in Opioid Overdoses: A Comprehensive Claims-Based Analysis, 2020-2024

Pandey, A.

2026-05-12 addiction medicine 10.64898/2026.05.08.26352752 medRxiv
Top 0.1%
3.5%

Purpose: Opioid overdose deaths disproportionately affect racial and ethnic minority populations in the United States, yet claims-based evidence characterizing the multi-dimensional structure of these disparities across incidence, treatment access, costs, and insurance coverage remains limited. Methods: We conducted a retrospective cross-sectional and longitudinal cohort analysis using the HealthVerity Launch Sample, a large administrative claims database. The study population comprised 3,675,823 patients across 5 racial groups enrolled between 2020 and 2024. Eight primary analyses were conducted, including age-sex standardized overdose rates, temporal disparity trends, medication-assisted treatment (MAT) receipt, naloxone access, pharmacy costs, insurance payer type, care setting, and multivariable logistic regression for overdose risk. Results: Black patients had the highest age-sex standardized overdose rate (363.4 per 100,000; rate ratio [RR] = 1.27 vs. White), and those with opioid use disorder (OUD) received MAT at a rate 35% lower than White patients (19.8% vs. 30.7%; RR = 0.645), driven primarily by a buprenorphine access deficit. AIAN patients demonstrated consistent multi-dimensional disadvantage across naloxone access, MAT engagement, and pharmacy costs. After adjustment for payer type, age, and sex, all non-White groups showed lower adjusted odds of overdose than White patients (Black OR = 0.87; AIAN OR = 0.25), with Medicaid enrollment carrying 7.06 times the overdose odds of commercial insurance. Conclusion: Insurance type is the dominant predictor of overdose risk, and the disproportionate Medicaid enrollment of Black patients is a consequence of both structural disadvantage and access disparities. Targeted interventions such as buprenorphine expansion in Medicaid and enhanced naloxone distribution are recommended.

14
Data Resource Profile: EST-Health-30

Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.

2026-04-24 epidemiology 10.64898/2026.04.21.26351087 medRxiv
Top 0.1%
3.2%

Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.

Key features
- EST-Health-30 is a population-representative dataset of complete health records for a random 30% sample of the Estonian population (~500,000 individuals) spanning 2012-present, enabling population-level epidemiological analyses with annual updates.
- The dataset is constructed using a random sampling approach based on hashed password-protected personal identifiers, ensuring consistent inclusion over time with unbiased population coverage.
- Individual-level data are linked across multiple nationwide databases, including electronic health records, claims, prescriptions, cancer and cause of death registry data, enabling multimodal analyses of health trajectories.
- All data are standardised to the OMOP Common Data Model (CDM) version 5.4 using international vocabularies (e.g., SNOMED CT, RxNorm, LOINC), supporting reproducibility and participation in federated research networks.
- The dataset is accessible through a secure processing environment compliant with the European Health Data Space (EHDS) framework.

15
Safety and tolerability of electronic cigarettes to reduce cigarette smoking: Secondary analysis from a randomized placebo-controlled trial

Dahal, S.; Talih, S.; Hrabovsky, S.; Sciamanna, C.; Livelsberger, C.; Soule, E.; Cobb, C. O.; Yingst, J.; Foulds, J.

2026-03-20 public and global health 10.64898/2026.03.18.26348637 medRxiv
Top 0.1%
3.2%

Background The clinical safety profile of e-cigarette use for smoking reduction remains poorly characterized. This study compared the relative safety and tolerability of nicotine e-cigarette use with non-nicotine e-cigarettes or a non-aerosol cigarette substitute (CS) among adults interested in reducing their smoking. Methods We conducted a secondary analysis of adverse events (AEs) reported in a 6-month, double-blind RCT involving 520 participants assigned to either e-cigarettes with 0, 8, or 36 mg/mL nicotine or a CS. AEs were coded using CTCAE V4.0 and assessed for frequency, severity, seriousness and relatedness across groups. Cumulative incidence was calculated over 24 weeks. We estimated risk differences (RDs) and 95% confidence intervals (CIs) for frequently reported AEs (>=1% of participants overall) comparing e-cigarette vs. CS and nicotine vs. non-nicotine e-cigarette groups. Fisher's exact test, with adjustment for multiple comparisons, was used to assess statistical significance. Results Most study-related AEs (those rated as possibly, probably, or definitely related by the medical monitor) were mild in severity and none were classified as serious. At 24 weeks, cumulative incidence of a first study-related AE was highest in the 36 mg/mL (37.0%) and 8 mg/mL (35.2%) e-cigarette groups, followed by the 0 mg/mL group (23.4%), and lowest in the CS group (2.5%). E-cigarette users experienced significantly greater risks of cough (RD [95%CI]: 8.5% [5.6-11.3]), headache (RD [95%CI]: 5.4% [3.3-7.6]) and sore throat (RD [95%CI]: 5.4% [3.2-7.6]) as compared with the CS group. Cough was also more common in those randomized to nicotine versus non-nicotine e-cigarettes (RD [95%CI]: 8.1% [3.4-12.8]). Conclusion All study products were generally well-tolerated; however, AEs were more common in e-cigarette groups, especially with nicotine. 
Findings highlight the need to monitor common symptoms such as cough, headache, and sore throat in clinical and regulatory evaluations of e-cigarette safety.

16
Nicotine pouch adverts reach ten times more young men than women: targeting and reach on Meta social media platforms in the UK

Sun, H.; Jackson, S. E.; Xiao, L.; Cox, S.; Oldham, M.; Tattan-Birch, H. O.

2026-05-28 public and global health 10.64898/2026.05.27.26354221 medRxiv
Top 0.1%
3.1%

Aims To examine which demographic groups nicotine pouch advertisers chose to target on social media, and which groups Meta's algorithms actually delivered the adverts to. Design Cross-sectional analysis of advert-level data from the Meta Ad Library. Setting Meta social media platforms (including Facebook and Instagram) in the UK. Cases A random sample of 741 nicotine pouch adverts shown in the 12 months up to December 2025, and a comparison sample of 1,125 general adverts. Analyses of reach were restricted to adverts eligible for all genders and adult ages (444 pouch adverts; 674 general). Measurements Outcomes were advertiser-set gender and age-group targeting criteria (i.e., groups eligible to be shown each advert) and estimated advert reach to each group (i.e., number of people who saw each advert). Male-to-female reach ratios within age groups, and reach ratios comparing age groups, were calculated per advert and summarised using geometric means. To assess whether patterns were pouch-specific, comparisons with general adverts were made using ratios of reach ratios (RRR). Findings Advertisers of nicotine pouches targeted a broad sample; most adverts (79.1%; 586/741) were eligible to be shown to all genders, and the remainder were restricted to men only. All were restricted to adults (minimum age 18 years) and most (95.6%; 708/741) had no upper age limit. Despite this, pouch adverts eligible to be shown to all adults were more likely to reach men, particularly younger men. Among 18-24-year-olds, pouch adverts reached around ten times as many men as women (RR 10.0, 95% CI 8.7-11.5), compared with a slight skew towards women for general adverts (RR 0.81, 95% CI 0.71-0.94), corresponding to an RRR of 12.3 (95% CI 10.0-15.1). Pouch adverts also showed a skew in reach towards younger age groups. 
Relative to those aged 35-44 years, reach was higher among 18-24-year-olds for nicotine pouch adverts (RR 1.33, 95% CI 1.17-1.51) but much lower for general adverts (RR 0.19, 95% CI 0.17-0.21), corresponding to an RRR of 7.0 (95% CI 6.0-8.2). Conclusions Nicotine pouch adverts on social media are often eligible to be shown broadly to all demographic groups but are disproportionately delivered to young men.
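
The reach-ratio measures described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the per-advert reach counts are hypothetical, and the sketch assumes every advert has non-zero reach in both groups. Per-advert ratios are summarised with a geometric mean, and the ratio of reach ratios (RRR) divides the pouch summary by the general-advert summary.

```python
from statistics import geometric_mean


def per_advert_ratios(numerator_reach, denominator_reach):
    """Reach ratio for each advert, e.g. men reached / women reached."""
    return [num / den for num, den in zip(numerator_reach, denominator_reach)]


def summary_reach_ratio(numerator_reach, denominator_reach):
    """Geometric mean of the per-advert reach ratios."""
    return geometric_mean(per_advert_ratios(numerator_reach, denominator_reach))


def ratio_of_reach_ratios(rr_pouch, rr_general):
    """RRR: how much stronger the skew is for pouch adverts than general ones."""
    return rr_pouch / rr_general


if __name__ == "__main__":
    # Hypothetical reach counts for two adverts (not data from the study).
    men = [200, 800]
    women = [100, 100]
    print(summary_reach_ratio(men, women))

    # Point estimates reported in the abstract for 18-24-year-olds.
    print(round(ratio_of_reach_ratios(10.0, 0.81), 1))
    # Point estimates reported for 18-24 vs 35-44 years.
    print(round(ratio_of_reach_ratios(1.33, 0.19), 1))
```

Dividing the rounded point estimates reproduces the reported RRRs only approximately in general, since the published values were presumably computed from unrounded data; the confidence intervals cannot be recovered this way.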

17
Assessing the Secondary Use and Scientific Impact of Shared Clinical Trial Data: A Cross-Sectional Study of Clinical Trials Shared on the YODA Project Platform

Taherifard, E.; Mooghali, M.; Hakimian, H. R.; Mane, S. R.; Fu, M.; Bamford, S.; Berlin, J. A.; Childers, K.; Desai, N. R.; Gross, C. P.; Hewens, D.; Lehman, R.; Ritchie, J. D.; Sargood, T.; Waldstreicher, J.; Wallach, J. D.; Willeford, M. K.; Krumholz, H. M.; Ross, J. S.

2026-03-26 public and global health 10.64898/2026.03.26.26349328 medRxiv
Top 0.1%
3.0%

Objective: To assess the number, timing of publication, characteristics, and scientific impact of secondary publications generated using individual participant-level data (IPD) from a portfolio of Johnson & Johnson-sponsored clinical trials shared with external investigators through a data sharing platform. Design: Cross-sectional study. Setting: Yale University Open Data Access (YODA) Project platform. Participants: Johnson & Johnson-sponsored clinical trials listed on the YODA Project platform with IPD available for external sharing as of December 31, 2021, and with a full-length, peer-reviewed publication (i.e., primary publication) reporting primary endpoint results by the original trial investigators. Main outcome measures: Number, timing of publication, research objectives, analysis type, and scientific impact of secondary publications using IPD from these trials identified through citation searches of primary publications in Web of Science through June 2025. Scientific impact metrics included journal impact factor, annual citation count, annual Altmetric Attention Score, and annual Mendeley reader count. Secondary publications were classified as internal (authored by at least one original trial investigator) or external. Results: Among 336 eligible trials, 265 (78.9%) had at least one associated secondary publication, totaling 1,167 secondary publications, of which 209 (17.9%) were external. Among external secondary publications for which the data access mechanism was reported (n=190; 90.9%), most obtained access through data sharing platforms (n=161; 84.7%), primarily the YODA Project (n=157; 82.6%). All secondary publications published from 3 years before through the first 2 years after the primary publication (n=161) were internal (100%). Over time, however, external publications increased steadily, exceeding 50% of all secondary publications by year 11 and thereafter. 
External secondary publications were more frequently pooled analyses (151/209 [72.2%] vs 534/958 [55.7%]; P<0.001). Predictive or prognostic modelling (108/209 [51.7%] vs 322/958 [33.6%]; P<0.001), development of statistical models or algorithms (60/209 [28.7%] vs 114/958 [11.9%]; P<0.001), and validation of existing methods, models, or risk scores (32/209 [15.3%] vs 66/958 [6.9%]; P<0.001) were more frequent among external than internal secondary publications. Compared to internal secondary publications, external secondary publications were published in journals with higher impact factors (median, 6.7 [IQR, 3.4-16.6] vs 4.6 [2.9-10.2]; P=0.002) and had higher annual Altmetric Attention Scores (median, 2.1 [0.7-7.1] vs 0.6 [0.3-2.3]; P<0.001), but lower annual citation counts (median, 2.7 [1.1-5.6] vs 3.4 [1.6-7.5]; P<0.001) and were less likely to be cited in clinical guidelines (21/184 [11.4%] vs 235/805 [29.2%], P<0.001) or policy documents (14/184 [7.6%] vs 206/805 [25.6%], P<0.001); there was no difference in annual Mendeley reader counts (median, 7.4 [3.9-13.0] vs 8.0 [5.1-13.6], P=0.13). Conclusions: Clinical trial data shared with external investigators through a data sharing platform generated substantial and sustained secondary research by both original trial investigators and external investigators. The proportion of secondary publications from any clinical trial generated by external investigators increased over time as external investigators pursued complementary research objectives that achieved a comparable scientific impact. Structured data sharing mechanisms may further enhance the scientific impact of clinical trials.

What is already known on this topic
- Sharing individual participant-level data (IPD) from clinical trials can promote transparency, reproducibility, and secondary research.
- Several initiatives, including the Yale University Open Data Access (YODA) Project and government-supported data sharing platforms, provide external investigators with access to clinical trial data.
- While prior evaluations of secondary research generated from shared clinical trial data suggest that external investigators' publications have citation impacts comparable to those of original trial investigators, overall evidence remains limited.

What this study adds
- Analysis of 336 industry-sponsored clinical trials with IPD shared through the YODA Project showed that most generated secondary publications, by both original trial investigators and external investigators.
- The proportion of secondary publications from any clinical trial generated by external investigators increased over time, and compared with those generated by the original trial investigators, these publications more frequently use pooled analyses and focus on predictive or prognostic modelling and the development and validation of statistical methods.
- Secondary publications generated by external investigators were more often published in higher-impact journals and received higher Altmetric Attention Scores, but had lower annual citation counts and were less likely to be cited in clinical guidelines or policy documents than those generated by the original trial investigators.

18
Real-World Weight Loss and Telehealth Platform Utilization Patterns of Long Term GLP-1 Receptor Agonist Treatment of self pay patients: A Retrospective Analysis

Patil, P.; Durvasula, R.; Patel, S.; Malik, M.; Patil, S.

2026-03-30 public and global health 10.64898/2026.03.27.26349009 medRxiv
Top 0.1%
2.9%

Importance: Glucagon-like peptide-1 receptor agonists (GLP-1 RAs) and dual glucose-dependent insulinotropic polypeptide/glucagon-like peptide-1 receptor agonists have demonstrated what may be considered transformative efficacy in recent randomized clinical trials for the treatment of obesity, yielding substantial weight loss in a majority of participants. However, the extent to which these trial results translate into routine clinical practice, particularly within the rapidly expanding direct-to-consumer (DTC) telehealth sector serving self-pay populations, remains insufficiently characterized. As access to and affordability of these therapies broaden beyond traditional insurance-based care models, evaluating real-world effectiveness, safety, and patient engagement among individuals shouldering the full financial cost of treatment is essential for informing future models of obesity care delivery. Objective: To assess long-term, medication-specific weight loss outcomes, including gender-specific responses and discrepancies, and to explore usage trends in a real-world, self-pay telehealth cohort receiving GLP-1 RA therapy, using an observational study design (retrospective data analysis). Setting and Participants: Retrospective data on patients enrolled in the electronic health records (EHR) of Carevalidate, a national US telehealth platform provider for online telehealth companies. The data span a total of 703 days, from January 12, 2024, to December 15, 2025. The analysis included 572 adults with a diagnosis of overweight or obesity who initiated treatment with semaglutide or tirzepatide and completed a minimum of 9 months of active follow-up. Patients with insufficient follow-up or those utilizing insurance coverage were excluded to isolate the self-pay phenotype. 
Exposures: Prescription of semaglutide or tirzepatide (injectable or oral formulations) via synchronous or asynchronous telehealth consultations, titrated according to standard clinical protocols adapted for patient tolerance and financial sustainability. Main Outcomes and Measures: The primary outcome was percentage total body weight loss (%TBWL) from baseline to the last recorded encounter. Secondary outcomes included categorical responder rates (5%, 10%, 15%, >20% weight loss), weight loss velocity analysis, and telehealth utilization metrics (frequency of encounters and visit intervals), including gender differences in approaching the telehealth program. Results: The final analytical cohort included 572 patients (79.2% female; 20.8% male). Overall, 95.8% (548/572) achieved weight loss, while 3.7% experienced weight gain. At 12 months, the mean %TBWL was 13.8% for the semaglutide cohort (n=450) and 12.5% for the tirzepatide cohort (n=122), with no statistically significant difference between the two medications (P >.05), contrary to standard clinical trial data suggesting tirzepatide superiority. A significant gender difference was observed: females were more numerous, comprising 80% of the cohort, and were more likely than males to be "major responders" (>20% weight loss) (29.8% vs 5.9%; P <.001). Conversely, males demonstrated significantly higher utilisation rates, attending more frequent encounters (mean 13.5 vs 12.7; P =.028) with shorter intervals between visits (35.6 vs 44.1 days; P =.009) compared to females. Weight loss velocity for both medications peaked during months 1 to 3 (~1.07 lbs/week) and declined substantially by months 12 to 15, indicating a plateau effect independent of the specific agent used. Conclusions and Relevance: Telehealth-managed GLP-1 treatment in a self-pay population demonstrates high efficacy comparable to clinical trials for semaglutide. 
However, tirzepatide outcomes fell short of trial benchmarks, likely due to economic barriers preventing optimal dose titration and a smaller sample size. The study identifies a discrepancy: females approach the telehealth-based self-pay system more often, but males engage more frequently with the digital platform, which could be due to their inferior physiological outcomes (less weight loss and more non-responders) compared to females. This suggests that while telehealth is a viable model for long-term obesity care, the "one size fits all" approach may be insufficient for under-responders, who may require distinct titration strategies or tailored behavioral interventions to overcome baseline genetic and biological resistance.

19
Validation of an AI-Assisted Framework for Systematic Bias Assessment in Observational Studies

Etminan, M.; Rezaeianzadeh, R.; Douros, A.

2026-04-28 epidemiology 10.64898/2026.04.26.26351778 medRxiv
Top 0.1%
2.6%

Background: The rapid expansion of medical literature has led to substantial variability and frequent contradictions in study findings, making it increasingly difficult to distinguish meaningful signals from noise. Much of this variability arises from differences in study methodology, where biases such as confounding, selection bias, and reverse causation can drive spurious associations. While artificial intelligence (AI)-assisted tools have been developed to support risk-of-bias assessment, most are designed for systematic reviews and are not tailored to identifying specific epidemiologic biases in observational studies. This highlights the need for structured, scalable approaches to evaluate study validity in real-world evidence. Objective: To develop and validate an AI-assisted, expert-informed, rule-based framework (EpiVise) for systematically identifying and classifying key sources of bias in pharmacoepidemiologic studies, and to assess its agreement with expert evaluation. Methods: We conducted a validation study using recently published pharmacoepidemiologic studies from high-impact journals (post-2025). Each study was independently assessed by the framework and two expert epidemiologists across predefined bias domains, including measured confounding, confounding by indication, selection bias, immortal time bias, and disease latency. Agreement was evaluated using weighted kappa statistics. In the absence of a gold standard, expert judgment served as the reference benchmark. In a second phase, synthetic study scenarios with predefined embedded biases were constructed to assess the framework's ability to detect known bias structures under controlled conditions. 
Results: In analyses of published studies (10 studies; 60 ratings), agreement between the framework and expert assessments was substantial (κ = 0.75; 95% confidence interval [CI], 0.60-0.86), with 12 discordant ratings (20.0%), all limited to adjacent categories and occurring primarily in the confounding by indication and selection bias domains. In synthetic study scenarios (10 studies; 50 ratings), agreement was similarly substantial, with 42 of 50 ratings concordant (84%) and a weighted kappa of 0.77 (95% CI, 0.67-0.87); discordances included both adjacent-category and extreme disagreements and were concentrated in confounding by indication, selection bias, and prevalent user bias domains. Conclusions: This AI-assisted, expert-informed framework, EpiVise, provides a scalable and reproducible approach for evaluating epidemiologic study validity, demonstrating substantial agreement comparable to expert assessment. By systematically identifying key sources of bias, the framework has the potential to enhance the rigor and consistency of evidence evaluation, support peer review, and inform clinical, regulatory, and policy decision-making. Further validation across broader study designs and domains is warranted.

20
Clinical Safety of AI-Generated Antibiotic Prescribing Advice: Guideline Adherence and Misinformation Risk Among Large Language Models

Khan, M. M.; Anwar, M. N.

2026-05-15 public and global health 10.64898/2026.05.13.26352828 medRxiv
Top 0.1%
2.4%

Background: Large language models (LLMs) are increasingly used in telehealth, but their safety in antibiotic prescribing remains uncertain, particularly in the presence of patient misinformation. Methods: A cross-sectional analytical study evaluated 5,000 responses from five chatbot models using 1,000 primary-care vignettes of mild infections. Guideline adherence, overprescribing, misinformation effects, and safety behaviors were assessed. Inappropriate prescriptions were classified using the WHO AWaRe framework. Results: Overall, 76.2% of responses were guideline-concordant, while 6.6% showed unprompted overprescribing and 17.2% were influenced by misinformation. Some models were more vulnerable to misinformation than others. Although most responses correctly noted that antibiotics do not treat viral infections, fewer advised consulting a doctor, and warnings against self-medication were rare. Many inappropriate prescriptions involved broad-spectrum antibiotics. Conclusion: LLMs show potential in telehealth but remain prone to misinformation and inappropriate prescribing. Stronger guideline integration and clinical oversight are necessary to ensure safe use. Keywords: antimicrobial stewardship; large language models; telehealth; antibiotic prescribing; misinformation; clinical safety